Abstract. Developmental instability provides a powerful monitoring tool to detect threats prior to population declines. Consequently, assessing the level of developmental instability by measuring fluctuating asymmetry (FA) of bilaterally symmetrical traits in association with environmental stress has become increasingly attractive. However, many studies have failed to detect a clear connection between FA and environmental stressors. Some of these may have suffered from large measurement error (ME) or the use of inappropriate methods. Here, we compared measurement accuracy and FA outcome from manual calliper measurements with those from non-destructive micro-3D-computed tomography (µCT) based skeletal measurements. Amphibians are assumed to be ideal models for measuring fluctuating asymmetry due to their sensitivity to environmental stress. For our study, we chose two bilateral, metric traits (femur and radio-ulna length) of the European Common Frog, Rana temporaria. Calliper measurements revealed meaningful FA estimates (i.e., FA exceeded ME) for radio-ulna length only. In contrast, µCT-based measurements delivered meaningful FA estimates for both traits. ME was about twice as high for calliper measurements as for µCT-based measurements, resulting in inflated levels of FA. Using callipers, we observed higher ME for femur measurements than for radio-ulna measurements, meaning that ME strongly depended on the respective trait. When using µCT, however, we observed comparable ME for both traits. Our study revealed that analyses of developmental instability using manual measurements should be treated with caution. For smaller vertebrates we recommend skeletal measurements with µCT as a valuable alternative due to its greater reliability, thereby allowing for multi-trait analyses with equal accuracy.

Key words. Agreement, amphibians, Bland-Altman, developmental stability, environmental stress, measurement error, Micro CT, morphology, Rana temporaria, traits.


Introduction 

Humans are altering landscapes all over the planet, and it is becoming increasingly clear that this causes many unwelcome effects that should be avoided wherever possible. A precondition for avoiding or reducing such effects is to monitor animal and plant populations facing the risk of environmental change. By starting the necessary counteractions in time, one can then try to maintain the health of populations before severe and irreversible declines occur. In this respect, developmental stability has been proposed as a sensitive indicator of population health (Clarke 1993, Freeman et al. 1996, Jones 1987, Lens et al. 2002b). Developmental stability, defined as the ability to develop the same phenotype irrespective of different environmental conditions (Zakharov et al. 1991), represents the ability to resist developmental accidents (Van Valen 1962) or imprecisions in developmental processes, also termed developmental noise (Waddington 1957). Decreases in developmental stability or increases in developmental noise through genetic or environmental disturbances result in developmental instability (Clarke 1995, Lens et al. 2002a, Palmer & Strobeck 1992).

The level of developmental instability is most commonly assessed by measuring the degree of fluctuating asymmetry (FA) (Mather 1953, Møller & Swaddle 1997, Palmer & Strobeck 1986). FA consists of small, random deviations from the symmetry of bilaterally symmetrical traits (Ludwig 1932), with right-minus-left (R-L) differences distributed symmetrically about a mean of zero (Graham et al. 2010, Palmer & Strobeck 1986). However, FA must be distinguished from two other types of asymmetry, i.e., directional asymmetry and antisymmetry (Van Valen 1962). Directional asymmetry occurs when a trait within a population is consistently larger on one particular side of the body; in contrast to FA, its R-L distribution has a mean that is significantly different from zero and is skewed to either the left or the right side. Antisymmetry occurs when a trait is usually larger on one side of the body, but the side that is larger varies among individuals, giving a mean of zero but a bimodal or platykurtic (i.e., broad-peaked) distribution (Møller & Swaddle 1997, Palmer & Strobeck 1986, Van Valen 1962). Directional asymmetry and antisymmetry are generally considered to be inappropriate for estimating developmental instability due to their presumed heritable component (Knierim et al. 2007). Although there might be transitions from FA to the other asymmetry types (Graham et al. 1993, Lens & Van Dongen 2000), and the genetic basis of FA is still under debate (Leamy et al. 2015, Leamy & Klingenberg 2005), it is recommended to first test for directional asymmetry and antisymmetry and, if present, to avoid the affected traits (Graham et al. 1998, Van Dongen 2006). FA has proved to have the potential to serve as a tool to detect environmental stress in all bilaterally symmetrical taxa, e.g. insects (Beasley et al. 2013, Schmeller et al. 2011), fish (Leary & Allendorf 1989, Valentine et al. 1973), amphibians (Costa & Nomura 2015, Söderman et al. 2007), reptiles (Lazić et al. 2013, Sarre 1996), birds (Anciães & Marini 2000, Lens & Eggermont 2008), and mammals (Marchand et al. 2003). However, there is a risk of ‘false positives’ when measurement error (ME) is not considered (Floate & Coghlin 2010, Hoffmann & Woods 2003). Many studies have failed to detect a clear connection between FA and environmental stressors, possibly, at least in part, because they suffered from inappropriate methods and statistics (Knierim et al. 2007, Palmer & Strobeck 2003a). Thus, increasing measurement accuracy (i.e., minimizing ME) by applying more sensitive methods might resolve the inconsistency of many FA results (Beasley et al. 2013, Merilä & Björklund 1995).

In our study, we compare one commonly used manual method and a computerized approach in terms of measurement accuracy and FA outcome. We considered two bilateral metric traits (femur and radio-ulna length) of the European Common Frog, Rana temporaria Linnaeus, 1758, and compared levels of ME and FA from external calliper measurements with those from non-destructive micro-3D-computed tomography (µCT) based skeletal measurements. We aim to clarify the applicability and particularly the reliability of these two methods for analysing developmental instability. In addition to our own data collection and analyses, we also summarized the currently available literature on FA in amphibians in order to get an overview of the methods applied and to detect possible method-dependent FA outcomes (see Appendix A for full summary).

Materials and methods 

Samples 

The correlation between environmental stress and FA is believed to be particularly pronounced in amphibians due to their physiology, semi-permeable skin, and mostly biphasic life cycles, which result in high susceptibility to environmental disturbances (Duellman & Trueb 1994, Ouellet et al. 1997, Wright & Zamudio 2002). As such, amphibians represent a very suitable model system for investigating FA variation between different methods. For our study, twenty intact (no broken bones) adult ethanol-preserved specimens of Rana temporaria from the Berlin-Brandenburg region, Germany, were obtained from the collection of the Museum für Naturkunde Berlin (ZMB; Supplementary Table S1). Only adults were included in our study. A snout-vent length of at least 5 cm was used as the criterion to define adults (Dittrich et al. 2018, Miaud et al. 1999).

Skeletal measurements with µCT

For µCT scanning, whole preserved frogs were removed from ethanol, wrapped in bubble wrap, and transferred to a dry plastic tube. Images were generated using a Phoenix|X-ray nanotom (GE Sensing & Inspection Technologies GmbH) at 90 kV and 150 µA with fast scan settings for upper and lower body scans, acquiring 1000 projections per scan. Effective voxel size ranged between 19 and 21 µm for each scan. Volumetric reconstructions were made in Datos|x-reconstruction software (GE Sensing & Inspection Technologies GmbH, Wunstorf, Germany). The femora (from the medial condyle to the femur head) and radio-ulnae (from the olecranon process to the styloid process of the ulna) of the right and left side of each individual were measured in VG Studio Max 3.0 with the distance measurement tool (Fig. 1A-C), resulting in 40 measurements of femora and 40 measurements of radio-ulnae (20 for each side) across all 20 individuals.

External measurements with a calliper 

Right and left forearm (from the flexed elbow to the base of the outer palmar tubercle) and thigh (distance from the vent to the knee) lengths (Watters et al. 2016) of the same specimens of Rana temporaria were measured externally with a digital calliper (INSIZE, code 1139, resolution 0.01 mm) (Fig. 1D), resulting in 40 measurements of forearms and 40 measurements of thighs (20 for each side) across all 20 individuals. The digital calliper was zeroed after each measurement.


Fluctuating asymmetry and measurement error 

Deviations from symmetry are often so small that they hardly exceed the magnitude of ME. In order to detect meaningful variation in FA between groups of interest, it is therefore essential to first assess ME for every single group and subtract it from the asymmetry mean square (Graham et al. 2010). This can be done by a two-way, mixed-model ANOVA procedure (Lens et al. 2002a, Merilä & Björklund 1995, Palmer & Strobeck 1986). For this purpose, all measurements were taken twice by the same observer (SN), at different times (minimum period between repeated measures: one day) and without access to the first measurement when taking the values the second time. Thus, in total 80 measurements of each trait per method were obtained (2 per trait; 40 for each side; Supplementary Table S1). Following the protocol of Palmer and Strobeck (2003b), obvious ME outliers that were significantly greater than expected due to sampling error were identified using Grubbs’ test (Grubbs & Beck 1972), which tests whether an observation deviates significantly from the sample mean. However, because the expected difference between the first and second measurement is zero, we also compared deviations from zero. The significance level for Grubbs’ test was set to p = 0.0125 after Bonferroni correction for multiple tests ($n_{Groups}$ = 4). Grubbs’ test led to the exclusion of one radio-ulna measured by calliper and one radio-ulna measured by µCT. Thus, the final sample size for the ME analyses was n = 40 for femora per method and n = 38 for radio-ulnae per method. All statistical analyses were performed using R (Version 3.5.0; R Core Team 2018). The significance level was set to p = 0.05 unless otherwise specified.

Stephanie Niemeier et al.

Figure 1. Traits measured for fluctuating asymmetry assessment of the European Common Frog, Rana temporaria (male, ZMB 87968). (A) 3D-µCT scan of entire body with (B) right radio-ulna and (C) right femur bone; (D) external measures of right radio-ulna and femur.

We then applied a two-way, mixed-model ANOVA to the repeated measurements for each trait and each method separately (R package ‘lme4’; Bates et al. 2015). The main fixed factor was side (S), which had two levels (left and right). The random factor was individual (I), and the side by individual interaction (S × I) was a mixed effect. Finally, an error term (err) represented measurement error (replications within side by individual); p-values for the fixed factors were obtained by applying the R package ‘lmerTest’ (Kuznetsova et al. 2017). Significance of the factor side indicates the presence of directional asymmetry and would thus interfere with an unbiased interpretation of developmental instability (Palmer & Strobeck 1986). The mean square of the individual by side interaction is a measure of fluctuating asymmetry and antisymmetry including measurement error. To get unbiased estimates of fluctuating asymmetry, we extracted the variance components ($\sigma^2_{err}$, $\sigma^2_{S \times I}$, $\sigma^2_{I}$) from the random effects of the mixed-model ANOVAs (Graham et al. 2010) and calculated signal (FA)-to-noise (ME) ratios (Knierim et al. 2007). The variance component for individuals ($\sigma^2_{I}$) is an estimate of the size variation among individuals. The variance component for the interaction effect ($\sigma^2_{S \times I}$) is an estimate of the nondirectional asymmetry (fluctuating asymmetry and antisymmetry). The variance component for replicates ($\sigma^2_{err}$) is an estimate of ME. Between-group comparisons with additional tests should only be done if within-group levels of FA exceed within-group levels of ME. To estimate the contribution of ME to the measured phenotypic variation and the repeatability of FA results, $MS_{S \times I}$ (mean squares of the side by individual interaction) and $MS_{err}$ (mean squares of the variance of the repeated measurements [error]) from the two-way mixed-model ANOVA were used to calculate specific FA and ME indices (Palmer & Strobeck 2003a): FA excluding ME in [mm] (FA10a); ME3, expressing ME as a percentage of the total nondirectional asymmetry including ME ($MS_{err}$ as % of $MS_{S \times I}$); ME5, a repeatability coefficient that does not describe ME directly but rather expresses FA variation as a proportion of the total between-sides variation, which includes ME (the larger the repeatability, the smaller the ME is relative to FA); ME1, reporting ME in the original units of measurement as the average difference between the repeated measurements in [mm]; FA including ME in [mm] (FA4a); ME1 as a percentage of FA4a; and FA1 (averaged replicates of |R-L|) as mean ± SE in [mm]. The two replicate measurements were then averaged, and these averages were used for further analyses. The final sample sizes for the subsequent FA analyses were n = 20 for femora per method and n = 19 for radio-ulnae per method.
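As a worked illustration, the indices described above can be computed directly from the two mean squares. The following Python sketch is not the authors' code (their analyses were run in R). It assumes two replicate measurements per side and uses the index formulas given in the caption of Table 3; the function name `fa_me_indices` and its argument names are hypothetical.

```python
import math

# Assumption: two replicate measurements per side, as in this study.
# The FA10a formula from the Table 3 caption is only written for this case.
N_REPLICATES = 2


def fa_me_indices(ms_sxi, ms_err):
    """Return FA and ME descriptors from two ANOVA mean squares.

    ms_sxi: mean square of the side x individual interaction
    ms_err: mean square of the replicate (measurement error) term
    """
    diff = ms_sxi - ms_err
    if diff < 0:
        raise ValueError("MS_SxI is smaller than MS_err; FA10a is undefined")
    return {
        "FA10a": 0.798 * math.sqrt(diff),                    # FA excluding ME [mm]
        "ME3": 100.0 * ms_err / ms_sxi,                      # ME as % of MS_SxI
        "ME5": diff / (ms_sxi + (N_REPLICATES - 1) * ms_err),  # repeatability
        "ME1": 0.798 * math.sqrt(ms_err),                    # ME [mm]
        "FA4a": 0.798 * math.sqrt(ms_sxi),                   # FA including ME [mm]
    }


# Example input: mean squares reported in Table 1 for the femur measured by µCT.
femur_uct = fa_me_indices(ms_sxi=0.0233, ms_err=0.0073)
for name, value in femur_uct.items():
    print(f"{name}: {value:.3f}")
```

Because the mean squares in Table 1 are rounded, results computed this way can differ slightly in the last digit from the values reported in Table 3.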

To avoid misinterpretation of the FA results, trait-size dependency was tested by Spearman’s rank correlation between absolute values of averaged FA replicates (averaged |R-L| = FA1) and trait size (averaged (R+L)/2) as an independent variable for each trait and each method. The absence of antisymmetry was validated by visually examining the frequency distributions of the averaged signed FA replicates for normality and by using the Anscombe-Glynn kurtosis test and D’Agostino skewness test (R package ‘moments’; Komsta & Novomestky 2015). The significance level for the tests of other types of asymmetry as well as for trait-size dependency was set to p = 0.0125 after Bonferroni correction for multiple tests ($n_{Groups}$ = 4). To determine whether the levels of ME (|M1-M2| = absolute values of Measurement 1 - Measurement 2) differed between the two methods and morphological traits, we used a mixed-model ANOVA with the method by trait interaction as fixed effect and individual, with random intercept and random slopes for the method by trait interaction, as random effect. Post-hoc analysis was done by conducting multiple pairwise comparisons of the estimated marginal means with Tukey adjustment (R package ‘emmeans’; Lenth 2018). Absolute values of FA measurements (|R-L|) assessed with the two different methods were compared for the radio-ulna by Bland-Altman analysis (Bland & Altman 1999), adjusted for repeated measurements. Because of proportional bias, hyperbolic confidence limits and prediction intervals around the line of best fit of differences on averages were constructed by correlated bivariate least square regression (Ludbrook 2010) (R package ‘BivRegBLS’; Francq & Berger 2017). We set a difference of 0.1 mm between methods as the acceptance interval. Finally, to assess the effect of method on the mean absolute FA (averaged replicates of |R-L|) outcome, we performed a mixed-model ANOVA with method as fixed factor and individual as random factor. To control for measurement error, we also included mean absolute ME as a fixed continuous covariate in the model. The interaction between the fixed factor and the continuous covariate was dropped for better main effect estimation because it turned out to be non-significant in preliminary model selection tests.
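The logic of the Bland-Altman comparison can be illustrated with a simplified sketch. The analysis described above used a correlated bivariate least square regression from an R package; the Python code below swaps in an ordinary least squares fit of the between-method differences on the between-method averages, which only approximates that approach and ignores the adjustment for repeated measurements. The function name and the example values are hypothetical and are not data from this study.

```python
def bland_altman_ols(method_a, method_b):
    """Fit differences (a - b) on averages ((a + b) / 2) by ordinary least squares.

    A slope clearly different from zero suggests proportional bias, i.e. the
    disagreement between methods grows with the size of the measured value.
    """
    if len(method_a) != len(method_b) or len(method_a) < 3:
        raise ValueError("need paired samples with at least three pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    avgs = [(a + b) / 2.0 for a, b in zip(method_a, method_b)]
    n = len(diffs)
    mean_x = sum(avgs) / n
    mean_y = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in avgs)
    if sxx == 0:
        raise ValueError("all averages are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(avgs, diffs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, diffs, avgs


# Made-up absolute FA values |R-L| in mm for five individuals.
calliper_fa = [0.10, 0.25, 0.40, 0.55, 0.70]
uct_fa = [0.08, 0.10, 0.12, 0.14, 0.16]

slope, intercept, diffs, avgs = bland_altman_ols(calliper_fa, uct_fa)

ACCEPTANCE_MM = 0.1  # acceptance interval used in the text
within_limit = [abs(d) <= ACCEPTANCE_MM for d in diffs]
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print("within acceptance interval:", within_limit)
```

In this toy example the differences grow with the averages, so the fitted slope is positive and only the smallest pair stays within the acceptance interval.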

Results 

Fluctuating asymmetry validation 

Measurement errors of two radio-ulnae (one measured using a calliper, one using µCT) from two different specimens were significant outliers when compared to the mean, with one of these also being a significant outlier when compared to zero (Grubbs’ critical value for n = 40 after Bonferroni correction ($n_{Groups}$ = 4): 3.24 (Grubbs & Beck 1972); $t_G$(mean) for outlier 1 (radio-ulna, calliper) = 3.46, $p_{2\text{-tail}}$ < 0.01; $t_G$(zero) for outlier 1 (radio-ulna, calliper) = 3.54, $p_{2\text{-tail}}$ < 0.01; $t_G$(mean) for outlier 2 (radio-ulna, µCT) = 3.59, $p_{2\text{-tail}}$ < 0.01; $t_G$(zero) for outlier 2 (radio-ulna, µCT) = 3.14, $p_{2\text{-tail}}$ < 0.025). For this reason, radio-ulna measurements from these two specimens were excluded from subsequent analyses for the respective method. For the femur measurements, no outliers were identified.
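The outlier screening reported above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: it computes the Grubbs statistic as the largest absolute deviation from a reference value divided by the sample standard deviation. Using the same scaling for the comparison against zero is an assumption made here, since the text does not spell out how that variant was calculated. The function name and example values are hypothetical.

```python
import statistics


def grubbs_statistic(values, reference=None):
    """Return (statistic, index) for the most extreme observation.

    reference=None compares against the sample mean (classic Grubbs statistic);
    reference=0.0 compares against zero, the expected difference between two
    repeated measurements (assumed variant, see lead-in).
    """
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("all values are identical; statistic is undefined")
    centre = statistics.mean(values) if reference is None else reference
    deviations = [abs(v - centre) for v in values]
    idx = max(range(len(values)), key=deviations.__getitem__)
    return deviations[idx] / sd, idx


# Hypothetical signed differences (measurement 1 - measurement 2) in mm.
me_diffs = [0.02, -0.01, 0.03, -0.02, 0.00, 0.01, -0.03, 0.45]

g_mean, i_mean = grubbs_statistic(me_diffs)
g_zero, i_zero = grubbs_statistic(me_diffs, reference=0.0)
print(f"vs mean: G = {g_mean:.2f} (observation {i_mean})")
print(f"vs zero: G = {g_zero:.2f} (observation {i_zero})")
```

The statistic is then compared with a tabulated critical value for the given sample size and significance level (3.24 in the text for n = 40 after Bonferroni correction); the toy sample here is too small for that particular threshold to apply.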

The four separate two-way mixed-model ANOVAs revealed no significant effect of the fixed factor side in any case, indicating the absence of directional asymmetry (Table 1). Frequency distributions of the signed FA values (averaged replicates of (R-L)) of both traits measured by µCT appeared roughly normal, whereas the distributions of the values measured with a calliper appeared left-skewed (Supplementary Fig. S1). Neither skewness nor kurtosis reached statistical significance after Bonferroni correction (Table 2). Thus, neither directional asymmetry nor antisymmetry was evident in these data. Non-significant Spearman’s rank correlations showed that the absolute values of FA (averaged replicates of |R-L| = FA1) of both traits did not depend on trait size (averaged replicates of (R+L)/2) for either method (Table 2).
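The descriptive quantities behind the skewness and kurtosis checks can be sketched as follows. This Python snippet only computes the moment-based sample skewness and the non-excess kurtosis, for which a normal distribution gives values near 0 and 3, respectively. It does not reproduce the D’Agostino or Anscombe-Glynn test statistics or their p-values, and it assumes that the kurtosis values reported here are on that non-excess scale. The example values are hypothetical.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness m3 / m2^1.5 (negative = longer left tail)."""
    m2 = central_moment(values, 2)
    return central_moment(values, 3) / m2 ** 1.5


def kurtosis(values):
    """Non-excess kurtosis m4 / m2^2 (about 3 for a normal distribution)."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2


# Hypothetical signed asymmetries (R - L) in mm with one strong left-side value.
signed_fa = [-0.30, -0.05, 0.00, 0.02, 0.03, 0.05, 0.06]
print(f"skewness = {skewness(signed_fa):.3f}")
print(f"kurtosis = {kurtosis(signed_fa):.3f}")
```

A negative skewness, as in this toy sample, corresponds to the left-skewed pattern described for the calliper measurements.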

Method and trait dependent fluctuating asymmetry and measurement error

The mixed-model ANOVAs revealed significance in the side by individual interaction term for both traits and both methods (Table 1). However, the variance components ($\sigma^2$) of the mixed effects revealed that FA estimates ($\sigma^2_{S \times I}$) were barely higher than ME estimates ($\sigma^2_{err}$) for measurements taken by µCT, consequently resulting in low signal (FA)-to-noise (ME) ratios. For femora measured with a calliper, $\sigma^2_{err}$ was about twice as high as $\sigma^2_{S \times I}$, resulting in the lowest signal-to-noise ratio. The highest signal-to-noise ratio was achieved by radio-ulna calliper measurements, but direct comparison of $\sigma^2_{err}$ with the µCT measurements revealed a $\sigma^2_{err}$ for calliper measurements about four times higher (Table 1). The descriptors derived from the ANOVA results varied between the methods (Table 3). FA excluding ME (FA10a) was higher for calliper measurements than for µCT-based measurements for both traits. The percentage of ME in the total non-directional asymmetry (ME3) was highest in calliper-measured femora, as expected given the high error variance ($\sigma^2_{err}$), but was very low in calliper-measured radio-ulnae compared to µCT-based measurements. The low value of ME3 in the radio-ulnae measured by callipers was reflected in the high repeatability (ME5). The average difference between the repeated measurements (ME1) accounted for a third of FA4a (FA including ME) for the radio-ulnae measured by calliper, against a half for µCT-based measurements. However, ME1 itself for radio-ulnae measured by calliper was about twice as high
as the values for µCT-based measurements. Generally, descriptors of FA and ME differed between traits within the calliper method, whereas results for both traits within the µCT method were comparable.

This was statistically supported by the results of the subsequent mixed-model ANOVA on levels of measurement error (|ME| = |M1-M2|) (Fig. 2A). ME differed significantly between methods (F = 26.28, p < 0.001) and between traits (F = 19.92, p < 0.001). The interaction between both factors was also significant (F = 15.66, p < 0.001), meaning that the two methods affect the discrepancy in measurement error between the traits to different extents. Pairwise comparisons of the estimated marginal means (emmeans ± SE) with Tukey adjustment revealed that |ME| for femur calliper measurements (0.626 ± 0.102) was significantly higher than |ME| for all other measurements (radio-ulna_calliper 0.135 ± 0.035, p = 0.001; femur_µCT 0.090 ± 0.034, p < 0.001; radio-ulna_µCT 0.056 ± 0.036, p < 0.001), whereas |ME| for µCT-based measurements of both traits and for calliper measurements of radio-ulnae were not significantly different from each other (all p > 0.42) (Fig. 2A). Since the ME for femora measured with a calliper was exceptionally high, we excluded that trait from further FA analyses. As revealed by the mixed-model ANOVA on fluctuating asymmetry (|FA| = averaged replicates of |R-L| = FA1), FA outcome differed significantly between methods (F = 7.09, p < 0.05). Comparison of the estimated marginal means showed significantly higher |FA| values for calliper measurements (0.316 ± 0.054) than for µCT measurements (0.103 ± 0.054) (Fig. 2B).

Table 1. Results from the two-way mixed-model ANOVAs (side = fixed effect, individual = random effect, side × individual = mixed effect) on untransformed repeated measurements for two traits (femur, radio-ulna) and two methods (µCT, calliper) in Rana temporaria. Two ME outliers for radio-ulna (one measured by µCT, the other one by calliper) were excluded from these analyses; *** = p < 0.001.

Trait       Method    Source of variation        df  Expected mean squares  Variance component $\sigma^2$  Signal:noise ratio
Femur       µCT       Side (S)                   1   0.0002
                      Individual (I)             19  49.1862***             12.2907
                      Side × Individual (S × I)  19  0.0233***              0.0080                         1.10
                      Measurement Error (err)    40  0.0073                 0.0073
            Calliper  Side (S)                   1   0.5056
                      Individual (I)             19  79.7089***             19.7746
                      Side × Individual (S × I)  19  0.6103***              0.1370                         0.41
                      Measurement Error (err)    40  0.3364                 0.3364
Radio-ulna  µCT       Side (S)                   1   0.0036
                      Individual (I)             18  10.4784***             2.6170
                      Side × Individual (S × I)  18  0.0103***              0.0038                         1.36
                      Measurement Error (err)    38  0.0028                 0.0028
            Calliper  Side (S)                   1   0.1811
                      Individual (I)             18  15.1817***             3.7435
                      Side × Individual (S × I)  18  0.2076***              0.0959                         6.07
                      Measurement Error (err)    38  0.0158                 0.0158

Table 2. Results of tests for skewness (D’Agostino test), kurtosis (Anscombe-Glynn test), and trait size dependency (Spearman’s rank correlation) for two traits (femur, radio-ulna) and two methods (µCT, calliper) in Rana temporaria. Two ME outliers for radio-ulna (one measured by µCT, the other one by calliper) were excluded from these analyses (see Supplementary Table S1). Significance level after Bonferroni correction for multiple tests ($n_{Groups}$ = 4) was set as p = 0.0125; FA1 = |R-L| of the averaged replicate measurements; trait size = (R+L)/2 of the averaged replicate measurements.

                           FA (R-L)                                              Corr. FA1 / trait size
Trait       Method    n    mean ± SE [mm]    Skewness  p       Kurtosis  p       Spearman's ρ  p
Femur       µCT       20   0.003 ± 0.017     0.555     0.225   3.286     0.355   -0.314        0.178
            Calliper  20   -0.084 ± 0.082    -0.913    0.056   2.410     0.802   -0.098        0.681
Radio-ulna  µCT       19   -0.014 ± 0.012    -0.622    0.185   4.203     0.090   -0.111        0.650
            Calliper  19   0.098 ± 0.052     -1.263    0.013   4.401     0.068   0.098         0.689

Table 3. Descriptors of fluctuating asymmetry (FA) and measurement error (ME) in Rana temporaria, derived from the results of the two-way mixed-model ANOVAs (side = fixed effect, individual = random effect, side × individual = mixed effect) on untransformed repeated measurements for two traits (femur, radio-ulna) and two methods (µCT, calliper). Two ME outliers for radio-ulna (one measured by µCT, the other one by calliper) were excluded from this analysis. $FA10a = 0.798\sqrt{MS_{S \times I} - MS_{err}}$; ME3 = $MS_{err}$ as a percentage of $MS_{S \times I}$; $ME5 = (MS_{S \times I} - MS_{err}) / [MS_{S \times I} + (2 - 1) MS_{err}]$; $ME1 = 0.798\sqrt{MS_{err}}$; $FA4a = 0.798\sqrt{MS_{S \times I}}$; FA1 = |R-L| of the averaged replicate measurements.

Trait       Method    FA10a [mm]  ME3     Repeatability (ME5)  ME1 [mm]  FA4a [mm]  ME1 as % FA4a   FA1 mean ± SE [mm]
Femur       µCT       0.101       31.37   0.52                 0.068     0.122      56.0            0.122 ± 0.019
            Calliper  0.418       55.12   0.29                 0.463     0.623      74.2            0.578 ± 0.100
Radio-ulna  µCT       0.069       26.85   0.58                 0.042     0.081      51.8            0.075 ± 0.015
            Calliper  0.349       7.61    0.86                 0.100     0.364      27.6            0.344 ± 0.070
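The variance components and signal-to-noise ratios in Table 1 can be checked against the reported mean squares. The Python sketch below assumes the usual expected mean squares for a balanced design with two sides and two replicates per side, i.e. $\sigma^2_{S \times I} = (MS_{S \times I} - MS_{err})/2$ and $\sigma^2_{I} = (MS_{I} - MS_{S \times I})/4$. These relations are not stated explicitly in the text and are used here as an assumption; the function and variable names are hypothetical.

```python
def variance_components(ms_ind, ms_sxi, ms_err, n_sides=2, n_reps=2):
    """Method-of-moments variance components from three ANOVA mean squares.

    Assumes a balanced sides x individuals design with n_reps replicate
    measurements per side (see lead-in for the assumed relations).
    """
    var_err = ms_err
    var_sxi = (ms_sxi - ms_err) / n_reps
    var_ind = (ms_ind - ms_sxi) / (n_sides * n_reps)
    return {
        "var_ind": var_ind,                      # size variation among individuals
        "var_sxi": var_sxi,                      # non-directional asymmetry (signal)
        "var_err": var_err,                      # measurement error (noise)
        "signal_to_noise": var_sxi / var_err,
    }


# Mean squares (individual, side x individual, error) as reported in Table 1.
table1_mean_squares = {
    ("femur", "uCT"): (49.1862, 0.0233, 0.0073),
    ("femur", "calliper"): (79.7089, 0.6103, 0.3364),
    ("radio-ulna", "uCT"): (10.4784, 0.0103, 0.0028),
    ("radio-ulna", "calliper"): (15.1817, 0.2076, 0.0158),
}

for (trait, method), mean_squares in table1_mean_squares.items():
    vc = variance_components(*mean_squares)
    print(f"{trait:<11}{method:<9}"
          f"var_sxi = {vc['var_sxi']:.4f}  var_err = {vc['var_err']:.4f}  "
          f"S:N = {vc['signal_to_noise']:.2f}")
```

Since the tabulated mean squares are rounded, recomputed ratios can deviate slightly in the last digit from those listed in Table 1.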

The correlated bivariate least square regression of the differences in absolute FA values (|R-L|) between methods on their respective averages was: Difference = 2.03 × Average - 0.18 (slope SE ± 0.203, p < 0.0001; intercept SE ± 0.053, p < 0.01). Proportional bias was present, i.e. the difference between the values from the two methods increased in proportion to their average. This is indicated by the significant departure from zero of the slope of the least squares linear regression. For averages of the methods up to 0.12 mm, differences between the two methods fell within the acceptance interval of 0.1 mm. Higher average values led to overestimation of absolute FA values by the calliper method (Fig. 3).


Figure 2. (A) Variation in measurement error |ME| and (B) fluctuating asymmetry |FA| for calliper and µCT measurements of femora and radio-ulnae of Rana temporaria; statistically significant differences (p < 0.05) are indicated by non-overlapping arrows, grey bars are confidence intervals for the estimated marginal means (emmeans) (mixed-model ANOVAs; for details on calculations and design of the models see text); |ME|: $n_{femur}$ = 40 for each method, $n_{radio-ulna}$ = 38 for each method. |FA|: $n_{radio-ulna}$ = 19 for each method.

Discussion

Our study emphasizes that the outcome of fluctuating asymmetry (FA) analyses is substantially influenced by the method applied. As this has been shown before, it has been recommended to rely on signal-to-noise ratios and repeatability to obtain meaningful FA results (Knierim et al. 2007). In our study, signal-to-noise ratio and repeatability appeared to be adequate for calliper measurements of the radio-ulnae. However, only the direct comparison between methods revealed that ME was higher for calliper measurements. When ME is a sizeable fraction of FA, the confidence in estimates of FA is lowered, even when FA is statistically significantly larger than ME (Palmer & Strobeck 2003b). Although the difference in ME between methods for radio-ulnae was not significant, the difference in FA results between methods clearly was. Increasing ME artificially inflates FA or obscures FA variation, resulting in false conclusions about developmental instability (Palmer 1994, Palmer & Strobeck 1986). Additionally, ME may differ among traits for several reasons and thereby again lead to a distorted impression of differences in FA among traits. For instance, ME tends to increase with decreasing size of the character for simple allometric reasons (Pankakoski et al. 1987). Likewise, imprecisely defined start- and end-points of measures on a trait or simply the nature of the trait (soft tissue vs. rigid organs) can lead to variation in ME (Van Dongen 2015, Van Nuffel et al. 2007). Furthermore, the accessibility of some traits may be lower than that of others and thus impede repeatable positioning of instruments (Knierim et al. 2007). Considering the higher, albeit not significantly higher, ME for calliper measurements, the level of associated FA seemed to be inflated for the radio-ulnae as well, even after correction for ME, and would consequently lead to false interpretations about developmental instability. In addition, the skewness in the frequency distributions of signed FA values indicated a tendency towards directional asymmetry in both traits measured with a calliper. This could reflect a bias caused by handedness (Brown & Brown 2002). Measurements on soft tissue of living or dead animals are more prone to handling bias than osteological measurements, because the pressure applied during handling and/or measuring may alter the exact position of the measuring points and consequently induce directional asymmetry (Helm & Albrecht 2000). A recent study showed that osteological µCT measurements are more precise than external manual measurements in detecting sexually dimorphic characters for the same reasons (Pogoda & Kupfer 2018). The difficulty of obtaining repeatable morphometric measurements of external characters is also known from taxonomic studies. Reasons range from inter-observer ME and preservation effects to inconsistent descriptions of anatomical features (Bernal & Clavijo 2009, Stephens et al. 2015, Vervust et al. 2009, Watters et al. 2016).

Figure 3. Bland-Altman plot illustrating differences between absolute values of fluctuating asymmetry for radio-ulnae as measured by calliper compared to µCT with hyperbolic confidence limits (CI) and prediction intervals (PI) around the line of best fit (correlated bivariate least square regression - CBLS); acceptance interval = 0.1 mm; n = 36.

Only for one trait, the radio-ulna, did external calliper measurements overcome the challenge of achieving an FA-caused between-sides variation that exceeded the variation due to measurement error (ME). Ignoring the ME of the calliper measurements would create the impression that FA levels were higher in the femora than in the radio-ulnae. Consequently, FA calliper results for the femur were unreliable. To increase the reliability of FA as a detector of stress, measurements from single traits can also be combined by forming a composite index of FA (CFA) (Leung et al. 2000, Palmer & Strobeck 2003a). This approach is only valid if ME is comparable among traits. In contrast to the calliper method, ME among traits measured by µCT yielded very similar values, so µCT measurements could be used for multiple-trait analyses.

The much higher accuracy of pCT was most convincingly shown
by its low ME. Therefore, we set pCT as the reference method
for the Bland-Altman analysis, which also substantiated that
the calliper measurements overestimated FA values.
Furthermore, calliper measurements led to proportional bias
due to the high measurement values arising from this method.
We assume that if a high-accuracy method fails to detect an
effect that a low-accuracy method does detect, the latter
result is likely a methodological artefact. Our sample size
was relatively small but, as shown by our analyses and by
e.g. Munoz-Munoz & Perpinan (2010), still sufficient to
detect reliable differences between methods in precision and
outcome. This was mainly because both methods were applied
to exactly the same individuals. When ME is relatively high,
an alternative to adopting a more sensitive method is to
increase sample size and/or the number of repeated
measurements per specimen. When ME is small, however,
smaller sample sizes and fewer repeats are sufficient to
detect variation in FA between groups of interest. For that
reason, sensitive methods such as pCT might be of particular
interest where the availability of individuals is limited,
e.g. in threatened species (Van Dongen 1999).
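
The agreement analysis described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes paired per-individual FA values from both methods, computes the mean difference (bias) with 95% limits of agreement, and estimates proportional bias as the slope of the differences regressed on the reference values (the classical Bland-Altman plot uses the mean of both methods on the x-axis instead). The function name bland_altman and all numbers are hypothetical.

```python
import numpy as np


def bland_altman(test, reference):
    """Agreement of a test method with a reference method (paired data)."""
    test = np.asarray(test, dtype=float)
    reference = np.asarray(reference, dtype=float)
    diff = test - reference

    bias = diff.mean()                 # systematic over- or underestimation
    sd = diff.std(ddof=1)              # sample SD of the differences
    # Slope different from 0 suggests the difference grows with trait value
    slope, intercept = np.polyfit(reference, diff, 1)

    return {
        "bias": bias,
        "loa_lower": bias - 1.96 * sd,   # lower 95% limit of agreement
        "loa_upper": bias + 1.96 * sd,   # upper 95% limit of agreement
        "slope": slope,
        "intercept": intercept,
    }


# Hypothetical FA values (mm) for the same four individuals
calliper_fa = [0.30, 0.42, 0.55, 0.70]
pct_fa = [0.20, 0.25, 0.30, 0.35]
print(bland_altman(calliper_fa, pct_fa))
```

A positive bias would correspond to the calliper method overestimating FA relative to the reference, and a clearly non-zero slope to proportional bias.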

A further benefit of the pCT method is its non-destructive
character, which makes it suitable for valuable preserved
museum specimens while also avoiding the measurement error
that arises from external measurements. The use of museum
collections provides a great opportunity to compare levels
of developmental instability before (baseline) and after an
environmental impact (Lens et al. 1999, Schmeller et al.
2011). FA is a relative estimator of developmental
instability because there is no standard or reference value
of asymmetry that indicates stability. Any conclusions about
the level of instability in a given population can only be
drawn by comparison with a control or reference population
(Clarke 1995). To avoid unnecessary disturbance of
populations, pCT could even be applied to carcasses
originating from road kills or predation, as long as the
characters of interest are undamaged. Major disadvantages of
the pCT-based method, however, are that it is more costly
and more time-consuming than the classic calliper method and
that it is applicable only to sacrificed or anaesthetized
animals after they have been brought to the lab. Hence, it
cannot serve as a standard technique to monitor, e.g.,
population declines of species in the field (Alford et al.
1999). However, the probability of detecting significant FA
in living animals in the field seems to be low anyway
because of the associated high ME, which limits the use of
FA as an indicator of environmental stress and population
health in studies with living animals (McCoy & Harris 2003).

In conclusion, our study shows that if fluctuating asymmetry
is to serve as an effective tool in the conservation of
amphibians and other endangered animals, results based on
calliper measurements of external traits, especially those
involving soft tissue, should be treated with caution and,
if possible, more accurate methods such as pCT should be
preferred.

Supplementary material

1 Supplementary Figure and 1 Supplementary Table: 

Figure S1. Frequency distributions of averaged replicate
measurements of signed FA (R-L) for each trait and each method.

Table S1. Raw data and background information for each Rana
temporaria individual used in the study.

Appendix A 

State of the art: Fluctuating asymmetry in amphibians 

For an overview concerning FA in amphibians, we searched the 
‘Web of Science’ (Web of Knowledge, Berlin, 10-16-2017) for ar¬ 
ticles using a minimum combination of two of the following key¬ 
words: ‘fluctuating asymmetry’ and ‘amphibian*’, ‘developmental 
*stability’ and ‘amphibian*’. We also searched the reference lists of 
the selected articles for additional studies that met our inclusion 
criteria. Our search included studies that: (1) used amphibian 
species, (2) addressed the question: ‘does a certain environmen¬ 
tal stressor affect FA?’, and (3) measured FA in bilateral metric 
(i.e. measurable distance) traits. In total, we found 23 publications 
analysing fluctuating asymmetry (FA) in bilateral metric traits of 
amphibians. Although it has been indicated before that
measurements of morphological characters with manual methods
are often imprecise (Munoz-Munoz & Perpinan 2010, Van Nuffel
et al. 2007), especially when applied to external instead of
skeletal characters (Tuyttens et al. 2005), only three of
these studies used computerized techniques on skeletal
traits. Five studies used computerized methods but on
external traits. Two studies used microscopes on external
traits. The remaining 13 studies used calliper measurements
on external traits, leading to inconsistent results
regarding the association of FA and environmental stress
(summarized in Table Appendix A).

As apparent from the overview, six of the 23 studies found a
positive association between the degree of fluctuating
asymmetry and environmental stress. However, nine studies
did not reveal such an association, and the remaining eight
studies found either a positive, a negative or no
association depending on the investigated trait. Altogether,
there were 17 cases that did not detect a positive
association of the degree of fluctuating asymmetry with
environmental stress. In seven of these 17 cases, this was
due to high measurement error (ME), directional asymmetry,
kurtosis or other unfulfilled preconditions to detect FA.
Six of these seven studies used external calliper
measurements. In three of the 14 cases in which a positive
correlation was found, ME was not assessed (or reported),
thereby leaving the interpretation of the results
questionable.
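
The case counts in this summary overlap, because the eight trait-dependent studies contribute to both the cases with and the cases without a positive association. The short tally below makes that arithmetic explicit. It only re-adds the numbers stated in the text, and the variable names are illustrative.

```python
# Study counts as stated in the text (not re-derived from the table)
positive_only = 6      # positive association for all investigated traits
no_association = 9     # no positive association for any trait
trait_dependent = 8    # outcome differed among traits

total_studies = positive_only + no_association + trait_dependent
cases_without_positive = no_association + trait_dependent
cases_with_positive = positive_only + trait_dependent

# Breakdown by measurement approach, also as stated in the text
methods = {
    "computerized, skeletal traits": 3,
    "computerized, external traits": 5,
    "microscope, external traits": 2,
    "calliper, external traits": 13,
}

print(total_studies, cases_without_positive, cases_with_positive)
print(sum(methods.values()))
```

Both the outcome categories and the method categories sum to the 23 publications covered by the overview.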

Our literature summary underlines the dependency of FA out¬ 
come on trait choice and highlights the inconsistency in results 
related to calliper measurements. Despite the weaknesses of man¬ 
ual measures taken from external characters, this is still the most 
commonly applied approach. 


Table Appendix A. Summary of publications dealing with the effects of environmental stress on fluctuating asymmetry in bilateral 
metric traits of amphibians. 


Taxon | Trait | Location | Method | Measurement | Stressor | Positive correlation of FA with stress | Reason for lack of correlation | Notes | Reference
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Litoria nannotis, Litoria genimaculata | hind limbs, forelimbs | external | calliper | manual | rising temperature | yes | | | (Alford et al. 2007)
Pelophylax perezi | humerus, radio-ulna, metatarsal, CFA | skeletal | X-ray | computer | habitat alteration | yes | | | (Burghelea et al. 2013)
Pelophylax perezi | tibio-fibula | skeletal | X-ray | computer | habitat alteration | no | no differences | | (Burghelea et al. 2013)
Physalaemus cuvieri | nostril-snout distance, eye width | external | images | computer | pesticides | yes | | | (Costa & Nomura 2015)
Eleutherodactylus antillensis, Eleutherodactylus coqui | femur, tibio-fibula, radio-ulna | skeletal | X-ray | computer | urbanization, habitat alteration, agriculture | no | no differences | | (Delgado-Acevedo & Restrepo 2008)
Bufo americanus, Hyla chrysoscelis | eye width, eye-nostril distance, radio-ulna, tibio-fibula, calcaneum | external | images | computer | toxicant (nitrate) | no | no differences | | (Earl & Whiteman 2009)
Hyla chrysoscelis | eye width, eye-nostril distance | external | images | computer | toxicant (phosphate) | no | no differences | | (Earl & Whiteman 2010)
Physalaemus cuvieri | digit | external | calliper | manual | urbanization | yes | | | (Eisemberg & Bertoluci 2016)
Physalaemus cuvieri | femur, tibio-fibula, radio-ulna | external | calliper | manual | urbanization | no | measurement error (ME), directional asymmetry (DA), kurtosis | | (Eisemberg & Bertoluci 2016)
Bokermannohyla saxicola | eye-nostril distance | external | microscope | manual | urbanization, habitat alteration, agriculture | no | no differences | | (Eterovick et al. 2015)
Bokermannohyla saxicola | femur, tibio-fibula, radio-ulna, eye-nostril distance | external | calliper | manual | urbanization, habitat alteration, agriculture | no | ME, DA, no differences | indication of correlation of FA with heterozygosity | (Eterovick et al. 2016)
Rana pipiens | deformed radio-ulna, and normal radio-ulna, tibio-fibula, femur | external | calliper | manual | agriculture | no | DA, kurtosis, size dependence, no differences | | (Gallant & Teather 2001)
Rana pipiens | deformed femur, tibio-fibula | external | calliper | manual | agriculture | yes | | but DA, kurtosis, and size dependence | (Gallant & Teather 2001)
Bufo bufo | forearm, tibia | external | calliper | manual | agriculture | yes | | no recording of ME | (Guillot et al. 2016)
Bufo bufo | parotid gland length and width | external | calliper | manual | agriculture | no | no recording of ME or other preconditions, no differences | | (Guillot et al. 2016)
Crinia signifera | forearm, phalanges, femur, tibio-fibula | external | microscope | manual | habitat alteration | no | negative correlation | | (Lauck 2006)
Agalychnis callidryas, Dendropsophus ebraccatus | femur, tibio-fibula | external | calliper | manual | urbanization, habitat alteration, agriculture | no | negative correlation, no differences | | (Matias-Ferrer & Escalante 2015)
Ambystoma maculatum | hind limbs (knee to tip of toe), forelimbs (olecranon process to tip of digit) | external | calliper | manual | low pH | no | ME, negative correlation, no differences | | (McCoy & Harris 2003)
Bufo fowleri, Hyla chrysoscelis | hind limbs | external | calliper | manual | pathogen | yes | | but no recording of ME | (Parris & Cornelius 2004)
Lithobates pipiens | tibio-fibula | external | calliper | manual | habitat alteration | yes | | | (Reeves et al. 2015)
Lithobates pipiens | radio-ulna, thumb, femur, foot | external | calliper | manual | habitat alteration | no | preconditions not fulfilled | | (Reeves et al. 2015)
Notophthalmus viridescens | hind limbs | external | images | computer | pathogen | yes | | | (Sherman et al. 2009)
Crinia signifera | hind limbs, forelimbs | external | calliper | manual | urbanization | yes | | | (Sievers 2017)
Rana arvalis | femur, tibio-fibula, humerus, radio-ulna | skeletal | microbalance | computer | low pH | yes | | | (Soderman et al. 2007)
Rana arvalis | ilium | skeletal | microbalance | computer | low pH | no | ME | | (Soderman et al. 2007)
Rana clamitans | femur, tibio-fibula, foot, humerus, radio-ulna, thumb, horizontal and vertical tympanum | external | calliper | manual | pathogen | yes | | higher levels of FA for individuals infected with Ranavirus | (St-Amour et al. 2010)
Rana clamitans | femur, tibio-fibula, foot, humerus, radio-ulna, thumb, horizontal and vertical tympanum | external | calliper | manual | pathogen | no | no differences | no increase of FA levels through fungus Batrachochytrium dendrobatidis (Bd) | (St-Amour et al. 2010)
Rana arvalis | thigh, crus, rostrum, eye, digit, heel tuber | external | calliper | manual | pollution, urbanization | no | neg. correlation, no differences | | (Vershinin et al. 2007)
Rana arvalis | temporal spot | external | calliper | manual | pollution, urbanization | yes | | but only in ethanol fixed individuals | (Vershinin et al. 2007)
Litoria wilcoxii/jungguy, Litoria nannotis, Litoria genimaculata, Nyctimystes dayi | tibio-fibula | external | calliper | manual | pathogen | no | no differences | | (Woodhams & Alford 2005)
Ambystoma maculatum | spot area | external | images | computer | pesticides, habitat alteration | yes | | no recording of ME | (Wright & Zamudio 2002)